John Rauser of Pinterest (now Amazon), speaking at Strata + Hadoop 2014. https://blog.revolutionanalytics.com/2014/10/statistics-doesnt-have-to-be-that-hard.html
Logic of hypothesis tests
Choose a statistic that measures the effect
Construct the sampling distribution under \(H_0\) (almost always done using technical mathematics)
Locate the observed statistic in the null sampling distribution
The p-value is the probability of observing data as extreme as, or more extreme than, the observed data, assuming the null hypothesis is true
Logic of permutation tests
Choose a test statistic
Shuffle the data (force the null hypothesis to be true)
Create a null sampling distribution of the test statistic (under \(H_0\)) (done using the computer, not calculus)
Find the observed test statistic on the null sampling distribution and compute the p-value (observed data or more extreme). The p-value can be one or two-sided.
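The logic above can be sketched in a few lines of Python. The data below are made up for illustration (they are not from any dataset in these slides); the test statistic is the difference in group means, and shuffling the pooled values forces the null hypothesis to be true:

```python
import numpy as np

rng = np.random.default_rng(47)

# Hypothetical two-group data (illustrative values only)
group_a = np.array([12.1, 9.8, 11.4, 10.2, 13.0, 9.5])
group_b = np.array([10.0, 8.7, 9.1, 10.4, 8.2, 9.9])

combined = np.concatenate([group_a, group_b])
n_a = len(group_a)

# Step 1: observed test statistic (difference in group means)
obs = group_a.mean() - group_b.mean()

# Steps 2-3: shuffle the group labels many times to build the
# null sampling distribution of the test statistic under H0
P = 10_000
null_stats = np.empty(P)
for i in range(P):
    perm = rng.permutation(combined)
    null_stats[i] = perm[:n_a].mean() - perm[n_a:].mean()

# Step 4: two-sided p-value -- proportion of permuted statistics
# at least as extreme as the observed one
p_value = np.mean(np.abs(null_stats) >= np.abs(obs))
```

A one-sided p-value would instead use `np.mean(null_stats >= obs)`.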
Consider the NHANES dataset.
Income
(HHIncomeMid - Numerical version of HHIncome derived from the middle income in each category)
Health
(HealthGen - Self-reported rating of participant’s health in general. Reported for participants aged 12 years or older. One of Excellent, Vgood, Good, Fair, or Poor.)
I’ve tried to break down the process of creating the test statistic, but the syntax is slightly different from what we would do in the actual tidy pipeline. It might be easier to understand the calculation of the observed test statistic two slides forward.
If the null hypothesis is true, the labels assigning groups are interchangeable with respect to the probability distribution.
Typically (in the two-group setting),
\[H_0: F_1(x) = F_2(x)\]
(there are no distributional or parametric conditions)
Exchangeability
More generally, we might use the following exchangeability definition
Data are exchangeable under the null hypothesis if, when the null hypothesis is true, the joint distribution from which the data came is the same before permutation as after permutation.
Probability as measured by what?
Random Sample The concept of a p-value usually comes from the idea of taking a sample from a population and comparing it to a sampling distribution (built from many, many random samples).
Randomized Experiment The p-value represents the observed data compared to the treatment variable being allocated to the groups “by chance.”
Permuting independent observations
Consider a “family” structure where some individuals are exposed and others are not (control).
Permuting homogeneous clusters
Consider a “family” structure where individuals in a cluster always have the same treatment.
Permuting heterogeneous clusters
Consider a “family” structure where individuals in a cluster always have the opposite treatment.
We want to know whether the population average score differs by perceived gender.
\[H_0: \mu_{ID.Female} = \mu_{ID.Male}\]
Note that for the permutation test, under the null hypothesis not only are the means of the population distributions the same, but so are the variances and all other aspects of the distributions across perceived gender.
Conceptually, there are two levels of randomization:
\(N_m\) students are randomly assigned to the male instructor and \(N_f\) are assigned to the female instructor.
Of the \(N_j\) assigned to instructor \(j\), \(N_{jm}\) are told that the instructor is male, and \(N_{jf}\) are told that the instructor is female for \(j=m,f\).
Fit the original model and obtain coefficient estimates (\(b_{0\cdot 1,2}\), \(b_{1\cdot 2}\), and \(b_{2\cdot 1}\)) and corresponding standard error estimates (\(SE(b_{0\cdot 1,2})\), \(SE(b_{1\cdot 2})\), and \(SE(b_{2\cdot 1})\)): \[\widehat{Y} = b_{0\cdot1,2} + b_{1\cdot2}X_1 + b_{2\cdot1}X_2\]
Permute \(Y\) to obtain \(Y^*\).
Fit a model on the permuted \(Y^*\) values to obtain permuted coefficient estimates (\(b^*_{0\cdot 1,2}\), \(b^*_{1\cdot 2}\), and \(b^*_{2\cdot 1}\)) and corresponding standard error estimates (\(SE(b^*_{0\cdot 1,2})\), \(SE(b^*_{1\cdot 2})\), and \(SE(b^*_{2\cdot 1})\)): \[\widehat{Y}^* = b^*_{0\cdot1,2} + b^*_{1\cdot2}X_1 + b^*_{2\cdot1}X_2\]
Repeat steps 2 and 3 \(P\) times. For example, \(P\) = 1000.
From the \(P\) copies of \(b^*_{1\cdot 2}\) and \(P\) copies of \(SE(b^*_{1\cdot 2})\), calculate \(P\) copies of \(t^*\) to form the permuted null sampling distribution: \[\begin{equation}t^* = \frac{b_{1\cdot2}^* - 0}{SE(b_{1\cdot2}^*)} \label{t_y} \end{equation}\]
Compare the observed test statistic to the permuted null sampling distribution from step 5: \[t_{obs} = \frac{b_{1\cdot2} - 0}{SE(b_{1\cdot2})}\]
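The permute-\(Y\) algorithm above can be sketched in Python. The data are simulated and all names here are illustrative assumptions (not from the slides); the OLS fit and the \(t^*\) statistic are computed directly with numpy:

```python
import numpy as np

rng = np.random.default_rng(47)

def t_stat_for_x1(X, y):
    """OLS fit of y on X (with intercept); return t = b_{1.2} / SE(b_{1.2})."""
    Xd = np.column_stack([np.ones(len(y)), X])   # design matrix [1, X1, X2]
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    b = XtX_inv @ Xd.T @ y                       # coefficient estimates
    resid = y - Xd @ b
    sigma2 = resid @ resid / (len(y) - Xd.shape[1])
    se = np.sqrt(sigma2 * np.diag(XtX_inv))      # standard error estimates
    return b[1] / se[1]                          # t for the X1 coefficient

# Hypothetical data: correlated predictors and an outcome (illustrative only)
n = 60
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 0.4 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])

# Step 1: observed test statistic from the original model
t_obs = t_stat_for_x1(X, y)

# Steps 2-5: permute Y, refit, repeat P times to form the null distribution
P = 2000
t_star = np.array([t_stat_for_x1(X, rng.permutation(y)) for _ in range(P)])

# Step 6: two-sided p-value against the permuted null distribution
p_value = np.mean(np.abs(t_star) >= np.abs(t_obs))
```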
Permuting \(Y\) - consequences
Indeed, permuting \(Y\) will break the relationship between \(Y\) and \(X_1\), which will force the null hypothesis to be true (which is what we want for testing).
However, permuting \(Y\) will also simultaneously break the relationship between \(Y\) and \(X_2,\) which may not be acceptable if we need to preserve the relationship to mirror the original data structure.
Permuting \(X_1\) - algorithm
Fit the original model and obtain coefficient estimates and corresponding standard error estimates: \[\widehat{Y} = b_{0\cdot1,2} + b_{1\cdot2}X_1 + b_{2\cdot1}X_2\]
Permute \(X_1\) to obtain \(X^*_1\).
Fit a model on the permuted \(X_1^*\) values to obtain permuted coefficient estimates and corresponding standard error estimates: \[\widehat{Y} = b^*_{0\cdot1,2} + b^*_{1\cdot2}X^*_1 + b^*_{2\cdot1}X_2\]
Repeat steps 2 and 3 \(P\) times.
From the \(P\) copies of \(b_{1\cdot2}^*\) and \(P\) copies of \(SE(b_{1\cdot2}^*)\), calculate \(P\) copies of \(t^*\) to form the permuted null sampling distribution: \[\begin{equation}t^* = \frac{b_{1\cdot2}^* - 0}{SE(b_{1\cdot2}^*)} \label{t_x1} \end{equation}\]
Compare the observed test statistic to the permuted null sampling distribution from step 5: \[t_{obs} = \frac{b_{1\cdot2} - 0}{SE(b_{1\cdot2})}\]
Permuting \(X_1\) - consequences
The permutation distribution created from permuting \(X_1\) will force the null hypothesis to be true.
However, permuting \(X_1\) has the side effect that the relationship between \(X_1\) and \(X_2\) will be broken in the permuted data.
If the data come from, for example, a randomized clinical trial (where \(X_1\) is the treatment variable), then \(X_1\) and \(X_2\) will be independent in the original dataset, and permuting \(X_1\) will not violate the exchangeability condition.
If \(X_1\) and \(X_2\) are correlated in the original dataset, then permuting \(X_1\) violates the exchangeability condition.
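The side effect is easy to see by simulation. In the sketch below (simulated, illustrative data), the sample correlation between \(X_1\) and \(X_2\) essentially vanishes once \(X_1\) is permuted:

```python
import numpy as np

rng = np.random.default_rng(47)

# Hypothetical correlated predictors (illustrative only)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)

x1_star = rng.permutation(x1)  # permuted X1

# Correlation with X2 before vs. after permuting X1
r_before = np.corrcoef(x1, x2)[0, 1]
r_after = np.corrcoef(x1_star, x2)[0, 1]
```

Here `r_before` is substantial while `r_after` is near zero: the permuted data no longer mirror the original \(X_1\)-\(X_2\) relationship.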
Permuting reduced model residuals - algorithm
Fit the reduced model on \(X_2\) only and obtain coefficient estimates and corresponding standard error estimates: \[\widehat{Y} = b_{0\cdot2} + b_{2}X_2\]
Let the residuals \(R_{Y\cdot2} = Y - b_{0\cdot2} - b_{2}X_2\), and permute \(R_{Y\cdot2}\) to obtain \(R^*_{Y\cdot2}.\) Define the permuted outcome variable as \(Y^* = b_{0\cdot2} + b_{2}X_2 + R^*_{Y\cdot2}.\)
Fit a model on the permuted \(Y^*\) values to obtain permuted coefficient estimates and corresponding standard error estimates: \[\widehat{Y}^* = b^*_{0\cdot1,2} + b^*_{1\cdot2}X_1 + b^*_{2\cdot1}X_2\]
Repeat steps 2 and 3 \(P\) times.
From the \(P\) copies of \(b_{1\cdot2}^*\) and \(P\) copies of \(SE(b_{1\cdot2}^*)\), calculate \(P\) copies of \(t^*\) to form the permuted null sampling distribution: \[\begin{equation}t^* = \frac{b_{1\cdot2}^* - 0}{SE(b_{1\cdot2}^*)} \label{t_red} \end{equation}\]
Compare the observed test statistic to the permuted null sampling distribution from step 5: \[t_{obs} = \frac{b_{1\cdot2} - 0}{SE(b_{1\cdot2})}\]
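The reduced-model-residual algorithm above (often called the Freedman-Lane procedure) can be sketched in Python. The data are simulated under a true null for illustration; all names and values are assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(47)

def ols(Xd, y):
    """OLS fit for design matrix Xd; return coefficients, SEs, residuals."""
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    b = XtX_inv @ Xd.T @ y
    resid = y - Xd @ b
    sigma2 = resid @ resid / (len(y) - Xd.shape[1])
    return b, np.sqrt(sigma2 * np.diag(XtX_inv)), resid

# Hypothetical data with H0 true (no X1 effect), for illustration
n = 80
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x2 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2])
X_red = np.column_stack([np.ones(n), x2])

# Step 1: reduced model on X2 only; keep fitted values and residuals
b_red, _, r = ols(X_red, y)
fitted_red = X_red @ b_red

# Observed t for b_{1.2} from the full model
b, se, _ = ols(X_full, y)
t_obs = b[1] / se[1]

# Steps 2-5: permute reduced-model residuals, rebuild Y*, refit full model
P = 1000
t_star = np.empty(P)
for i in range(P):
    y_star = fitted_red + rng.permutation(r)   # Y* = b_{0.2} + b_2 X2 + R*
    b_s, se_s, _ = ols(X_full, y_star)
    t_star[i] = b_s[1] / se_s[1]

# Step 6: two-sided p-value against the permuted null distribution
p_value = np.mean(np.abs(t_star) >= np.abs(t_obs))
```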
Permuting reduced model residuals - consequences
The permutation preserves the relationship between \(X_1\) & \(X_2\) as well as the relationship between \(X_2\) & \(Y\).
However, in order for the relationship between \(X_1\) & \(Y\) to be broken (i.e., to obtain a null sampling distribution for the test of \(H_0: \beta_{1\cdot2} = 0\)), \(X_1\) and \(X_2\) must not be associated.
Relationships

| Permutation | Broken Relationships | Preserved Relationships |
|---|---|---|
| Permute \(Y\) | \(X_1\) & \(Y\); \(X_2\) & \(Y\) | \(X_1\) & \(X_2\) |
| Permute \(X_1\) | \(X_1\) & \(X_2\); \(X_1\) & \(Y\) | \(X_2\) & \(Y\) |
| Permute reduced model residuals | \(X_1\) & \(Y\) (if \(X_1\) & \(X_2\) are uncorrelated) | \(X_1\) & \(X_2\); \(X_2\) & \(Y\) |